Better Data Beat Big Data
Abstract
Generalizability of models of student learning is a highly desirable feature. As new students interact with educational systems, highly predictive models, tuned to increasing amounts of data from previous learners, presumably allow such systems to provide a more individualized, optimal learning path, give better feedback, and provide a more effective learning experience. However, any large student/user population will be heterogeneous and likely consist of discernible subpopulations for which specific models of learning may be appropriate. Student subpopulations may differ with respect to cognitive factors, the level and quality of instruction, and many other environmental and noncognitive factors. The era of both “big data” and widely deployed educational software, including Carnegie Learning’s Cognitive Tutor (CLCT) intelligent tutoring system, presents opportunities to analyze increasingly large volumes of data collected during learners’ interactions with educational systems. These data cover a broad spectrum of learners, allowing researchers to investigate the structure of an increasingly representative student population. In this work, we investigate discovering student subpopulations from “big data.” Using a year’s worth of data from CLCT, we test the hypothesis that commonly used stratifications of student subpopulations (e.g., school location, socio-demographic factors) offer ways to meaningfully partition learners. We discover that, rather than finding distinct subpopulations that should be treated differently, a particular subpopulation of learners provides especially “high quality” data, and that models learned from this subpopulation outperform all other models, even when predicting student learning for the subpopulations on which those other models were trained. In this way, “better data beat big data.”
Similar resources
Survey on Perception of People Regarding Utilization of Computer Science & Information Technology in Manipulation of Big Data, Disease Detection & Drug Discovery
This research explores the manipulation of biomedical big data and disease detection using automated computing mechanisms. Because an efficient and cost-effective way to discover diseases and drugs is important to society, a computer-aided automated system is a must. This paper aims to understand the importance of computer-aided automated systems among people. The analysis result from collected da...
A Multi-model Approach to Beat Tracking Considering Heterogeneous Music Styles
In this paper we present a new beat tracking algorithm which extends an existing state-of-the-art system with a multi-model approach to represent different music styles. The system uses multiple recurrent neural networks, which are specialised on certain musical styles, to estimate possible beat positions. It chooses the model with the most appropriate beat activation function for the input sig...
Feature Selection in Structural Health Monitoring Big Data Using a Meta-Heuristic Optimization Algorithm
This paper focuses on the processing of structural health monitoring (SHM) big data. Extracted features of a structure are reduced using an optimization algorithm to find a minimal subset of salient features by removing noisy, irrelevant and redundant data. The PSO-Harmony algorithm is introduced for feature selection to enhance the capability of the proposed method for processing the measure...
Implementation of Random Forest Algorithm in Order to Use Big Data to Improve Real-Time Traffic Monitoring and Safety
Nowadays, active traffic management is enabled for better performance due to the nature of real-time large data in transportation systems. With the advancement of large data, actively and appropriately monitoring and improving traffic safety has become a necessity. Performance efficiency and traffic safety are considered an important element in measuring the pe...
Big Data Quality: From Content to Context
Over the last 20 years, and particularly with the advent of Big Data and analytics, Data and Information Quality (DIQ) has remained a fast-growing research area. There are many views and streams in DIQ research, generally aiming at improving the effectiveness of decision making in organizations. Although much research has aimed at clarifying the role of Big Data...